Last Tuesday there were two questions, and I'd like to come back to these questions right now. The first question was about the Lagrange multiplier method in general. I looked on the Internet, and here is a document from the University of Saarland; here is the Internet address. You just google for the University of Saarland and this title here, and this is a nice example of
how this method actually works. We have a problem: a function f of two parameters x and y, and the function computes the product of both parameters. As a geometric interpretation, x and y are the side lengths of a rectangle and f computes its area. And we have a constraint. The constraint is given here by a function g, and it says that two times x plus two times y, the perimeter, should be equal to a given number u. And we want to find the parameters that maximize the area of the rectangle given this perimeter. There is one easy solution. It is easy because you can solve the constraint for x, plug it into your function f, take the derivative, set it to zero, and solve.
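In formulas, that substitution approach is just this short computation (my own sketch of the steps just described, consistent with the result u/4 quoted below):

\[
2x + 2y = u \;\Rightarrow\; x = \tfrac{u}{2} - y,
\qquad
f(y) = \Bigl(\tfrac{u}{2} - y\Bigr)\,y,
\qquad
f'(y) = \tfrac{u}{2} - 2y \stackrel{!}{=} 0
\;\Rightarrow\; y = \tfrac{u}{4}, \;\; x = \tfrac{u}{4}.
\]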
The other possibility, which is more general, is to use the Lagrange multiplier method. So you set up the Lagrange function, which is your function f of x and y plus the constraint multiplied by the Lagrange multiplier. And then you have the conditions that the derivative with respect to x, the derivative with respect to y, and the derivative with respect to the Lagrange multiplier all have to be zero. That is the important condition, and if you solve it, you get values for x and y and also for lambda. So you see here that y has to be minus two times lambda. The same holds for x, so you can conclude that x has to be equal to y. Then you can plug this result into the constraint equation, and you get a value for x, which is u divided by four. So much for the first question; this is how the technique works in general. This was an example with an equality constraint, but it works the same way for inequality constraints.
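Written out, the computation just described looks like this (again my own sketch, using the values quoted above):

\[
L(x, y, \lambda) = x\,y + \lambda\,(2x + 2y - u)
\]
\[
\frac{\partial L}{\partial x} = y + 2\lambda = 0, \qquad
\frac{\partial L}{\partial y} = x + 2\lambda = 0, \qquad
\frac{\partial L}{\partial \lambda} = 2x + 2y - u = 0
\]
\[
\Rightarrow\; x = y = -2\lambda, \qquad 4x = u, \qquad x = y = \frac{u}{4}, \quad \lambda = -\frac{u}{8}.
\]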
The second question was: the Lagrange function of our SVM problem is a convex function, so why don't we use the optimization techniques that were introduced before, why don't we use descent methods? Indeed, it really is a convex function, and you can use descent methods. The benefit of the other approach is that if we go to the dual problem, we get rid of the parameters alpha and alpha zero. So we get rid of a large number of parameters and only have to deal with the Lagrange multipliers and the slack variables. It is always better to optimize a problem with a smaller number of parameters. Also, descent methods have a problem in general: we talked about the step size, which you have to choose appropriately, and in general these methods are quite slow. So if you can solve a problem in closed form, that is always better. However, as I was told, there really are implementations that are based on descent methods.
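As an illustration of what such a descent-based implementation could look like (this is not the method from the lecture, just a minimal sub-gradient descent on the soft-margin primal objective; the function name and the toy data are made up):

import numpy as np

def train_linear_svm_sgd(X, y, C=1.0, lr=0.001, epochs=200):
    # Sub-gradient descent on the primal soft-margin objective
    #   1/2 * ||alpha||^2 + C * sum_i max(0, 1 - y_i * (alpha^T x_i + alpha_0))
    # X: (n, d) array of feature vectors, y: labels in {-1, +1}.
    n, d = X.shape
    alpha = np.zeros(d)   # weight vector (called alpha in the lecture)
    alpha_0 = 0.0         # bias term (alpha_0 in the lecture)
    for _ in range(epochs):
        for i in np.random.permutation(n):
            margin = y[i] * (X[i] @ alpha + alpha_0)
            if margin < 1:
                # hinge term is active: step along its sub-gradient plus the regularizer
                alpha -= lr * (alpha - C * y[i] * X[i])
                alpha_0 += lr * C * y[i]
            else:
                # only the regularizer contributes
                alpha -= lr * alpha
    return alpha, alpha_0

# tiny usage example on linearly separable toy data
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
alpha, alpha_0 = train_linear_svm_sgd(X, y)
print(np.sign(X @ alpha + alpha_0))  # predicted labels, should match y here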
Here is the dual Lagrange function, and you see that this function is only in the parameters lambda; you get rid of the original parameters alpha and alpha zero. Of course, the other Lagrange multipliers and the slack variables are missing here, because we are looking at the hard-margin case.
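For reference, the hard-margin dual that is meant here has the standard form (writing lambda_i for the multipliers as in the lecture; the symbols x_i and y_i for the m training samples and their plus/minus one labels are my notation):

\[
\max_{\lambda} \;\; \sum_{i=1}^{m} \lambda_i \;-\; \frac{1}{2} \sum_{i=1}^{m}\sum_{j=1}^{m} \lambda_i \lambda_j\, y_i y_j\, x_i^{T} x_j
\qquad \text{subject to} \quad \lambda_i \ge 0, \;\; \sum_{i=1}^{m} \lambda_i y_i = 0 .
\]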
So yesterday we were talking about the kernel trick, and what is the message of the kernel trick? Assume we have a classification problem and we want to separate two classes. We have examples of class plus one here, and let's say we have examples of class minus one here, okay? And as you see, this is not a linear classification problem; you can't solve it with a linear decision boundary. So the decision boundary should look like this. And now you want to use support vector machine techniques, so you are looking for a band, as wide as possible, that separates the two regions. Okay? And the question is: how can you solve this problem? You have support vector machines, and we know that we can substitute all inner products by a kernel function.
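A small sketch of that idea (not from the lecture; it assumes scikit-learn is available and uses a made-up two-circles dataset like the picture drawn on the board):

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two classes that are not linearly separable: class +1 inside, class -1 around it.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
y = 2 * y - 1  # relabel {0, 1} as {-1, +1}

# Linear SVM: the plain inner products x_i^T x_j are used -> poor accuracy here.
linear_svm = SVC(kernel='linear', C=1.0).fit(X, y)

# Kernel trick: every inner product is replaced by a kernel k(x_i, x_j),
# here a Gaussian (RBF) kernel -> the two classes become separable.
rbf_svm = SVC(kernel='rbf', C=1.0, gamma=2.0).fit(X, y)

print('linear kernel accuracy:', linear_svm.score(X, y))
print('RBF kernel accuracy:   ', rbf_svm.score(X, y))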